I originally wrote this post to help me understand and internalize HMMs and the Viterbi algorithm, but school got too busy and I couldn’t finish the post. Still, there’s a fair amount of material here so I decided to publish it anyway. I hope it helps you at least a little bit, as it did for me.
Hidden Markov models (HMMs) can represent many kinds of real-world processes where what we observe doesn't directly tell us what's going on underneath. For example, when a baby is crying, you don't actually know why she's crying - it could be that she is hungry, or that she is tired, or any of a multitude of other reasons.
As a parent who is also a machine learning practitioner, you decide to use data to figure out why, so that you can soothe her as effectively as possible.
First, you want to establish the parameters of the model. This means specifying three kinds of parameters:
Set of (hidden) states
For simplicity's sake, we're just going to assume that the baby can only be in the states "normal", "tired", and "hungry". These states are not directly observable by us, or put another way, they are hidden from us - thus the "hidden" in hidden Markov models. This is what we want to be able to predict.
Set of observable outputs
In this case, the set would consist of `crying` and `not_crying`. This output is a result of the hidden state that the baby is in, and is said to be "emitted" from the current state. For each state, there is a certain probability of emitting each output.
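To make this concrete, here's a minimal sketch of how these two sets could be written down in Python (the variable names are just my own choice):

```python
# Hidden states: what the baby is actually feeling (not directly observable).
states = ("normal", "tired", "hungry")

# Observable outputs: what we can actually see (and hear).
outputs = ("crying", "not_crying")
```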
Transition probabilities
The transition probabilities are a set of numbers that describe the probability of transitioning from any one state to another. In this case, this would be described by:
|            | crying | not_crying | STOP |
|------------|--------|------------|------|
| START      | 0.2    | 0.8        | 0    |
| crying     | 0.8    | 0.1        | 0.1  |
| not_crying | 0.4    | 0.5        | 0.1  |
So, given that the baby is currently crying, there’s a 0.8 probability that she will continue to cry, a 0.1 probability that she will stop, and a 0.1 probability that the observed sequence will end (this is when you pass the baby to your spouse and question, for the 124th time, why you became a parent).
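As a sketch, the table above could be stored as a nested dictionary, keyed first by the current symbol and then by the next one (the layout is just one reasonable choice):

```python
# Transition probabilities, copied from the table above.
# transition[current][nxt] = probability of moving from current to nxt
transition = {
    "START":      {"crying": 0.2, "not_crying": 0.8, "STOP": 0.0},
    "crying":     {"crying": 0.8, "not_crying": 0.1, "STOP": 0.1},
    "not_crying": {"crying": 0.4, "not_crying": 0.5, "STOP": 0.1},
}

# Example lookup: a crying baby keeps crying with probability 0.8.
print(transition["crying"]["crying"])  # 0.8
```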
Emission probabilities
The emission probabilities are a set of numbers that describe the probability of an observable output being "emitted" given the current state.
|            | normal | tired | hungry |
|------------|--------|-------|--------|
| crying     | 0.1    | 0.4   | 0.5    |
| not_crying | 0.6    | 0.2   | 0.2    |
So, given that a baby is currently not crying, there is a 0.6 probability that she is feeling okay, a 0.2 probability that she is tired, and a 0.2 probability that she is hungry.
(`START` and `STOP` represent the start and end of an observed sequence, so an example would look something like `START not_crying not_crying crying crying STOP`.)
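The emission table can be stored the same way; a minimal sketch, keyed the way the table above is laid out:

```python
# Emission probabilities, copied from the table above.
# emission[output][state], e.g. emission["not_crying"]["normal"] = 0.6
emission = {
    "crying":     {"normal": 0.1, "tired": 0.4, "hungry": 0.5},
    "not_crying": {"normal": 0.6, "tired": 0.2, "hungry": 0.2},
}

# Example lookup: the baby is not crying, and is feeling okay with probability 0.6.
print(emission["not_crying"]["normal"])  # 0.6
```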
The question is:
If I observed my baby to be `not_crying crying crying`, what's the most likely sequence of states my baby was feeling throughout that period of observation? Could it be `normal tired hungry`, or `normal tired tired`?
Given all these parameters, we can now use the hidden Markov model to infer the most likely sequence of hidden states. The easiest way would be to enumerate all possible state sequences and calculate each of their probabilities.
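Here's a minimal sketch of that brute-force approach, using the two tables above. The post doesn't spell out exactly how the transition and emission terms should be combined into a single score, so the `score` function below (one transition term and one emission term per time step, plus the final `STOP` term) is my assumption:

```python
from itertools import product

states = ("normal", "tired", "hungry")

transition = {
    "START":      {"crying": 0.2, "not_crying": 0.8, "STOP": 0.0},
    "crying":     {"crying": 0.8, "not_crying": 0.1, "STOP": 0.1},
    "not_crying": {"crying": 0.4, "not_crying": 0.5, "STOP": 0.1},
}
emission = {
    "crying":     {"normal": 0.1, "tired": 0.4, "hungry": 0.5},
    "not_crying": {"normal": 0.6, "tired": 0.2, "hungry": 0.2},
}

def score(observed, hidden):
    # Multiply one transition term and one emission term per time step,
    # then the final STOP term (this combination is an assumption).
    p = 1.0
    previous = "START"
    for output, state in zip(observed, hidden):
        p *= transition[previous][output] * emission[output][state]
        previous = output
    return p * transition[previous]["STOP"]

observed = ("not_crying", "crying", "crying")

# Enumerate every possible hidden state sequence (3^3 = 27 of them)
# and keep the one with the highest score.
best = max(product(states, repeat=len(observed)),
           key=lambda hidden: score(observed, hidden))
print(best)
```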
The more efficient way to do it would be to use the Viterbi algorithm, a dynamic programming algorithm with a time complexity that is linear in the length of the observed sequence and quadratic in the number of possible hidden states.
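For reference, here is a minimal sketch of the Viterbi recursion for a textbook HMM parameterization, where transitions are defined between hidden states and each state emits an output (note that this is not quite how the tables above are indexed). The parameter names are my own; the point is just to show the dynamic programming structure:

```python
def viterbi(observed, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for the observed outputs.

    start_p[s]     = probability that the first state is s
    trans_p[s][s2] = probability of moving from state s to state s2
    emit_p[s][out] = probability that state s emits output out

    Runs in O(T * K^2) time: T = len(observed), K = len(states).
    """
    # best[t][s] = highest probability of any state sequence that ends in
    # state s at time t and explains the first t+1 outputs.
    best = [{s: start_p[s] * emit_p[s][observed[0]] for s in states}]
    backpointer = [{}]

    for t in range(1, len(observed)):
        best.append({})
        backpointer.append({})
        for s in states:
            # Pick the best previous state to have come from.
            prev = max(states, key=lambda q: best[t - 1][q] * trans_p[q][s])
            best[t][s] = best[t - 1][prev] * trans_p[prev][s] * emit_p[s][observed[t]]
            backpointer[t][s] = prev

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observed) - 1, 0, -1):
        path.append(backpointer[t][path[-1]])
    return list(reversed(path)), best[-1][last]
```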